    Motivations and challenges for stream processing in edge computing

    The 2030 Agenda for Sustainable Development of the United Nations General Assembly defines 17 development goals to be met for a sustainable future. Goals such as Industry, Innovation and Infrastructure and Sustainable Cities and Communities depend on digital systems. As a matter of fact, billions of Euros are invested into digital transformation within the European Union, and many researchers are actively working to push the boundaries of techniques and tools able to extract value and insights from the large amounts of raw data sensed in digital systems. Edge computing aims at supporting this data-to-value transformation. In digital systems that traditionally rely on central data gathering, edge computing proposes to push the analysis towards the devices and data sources, thus leveraging the large cumulative computational power found in modern distributed systems. Some of the ideas promoted in edge computing are not new, though. Continuous and distributed data analysis paradigms such as stream processing have argued for the need for smart distributed analysis for roughly 20 years. Starting from this observation, this talk covers a set of standing challenges for smart, distributed, and continuous stream processing in edge computing, with real-world examples and use cases from smart grids and vehicular networks.

    TinTiN: Travelling in time (if necessary) to deal with out-of-order data in streaming aggregation

    Cyber-Physical Systems (CPS) rely on data stream processing for high-throughput, low-latency analysis with correctness and accuracy guarantees (building on deterministic execution) for monitoring, safety, or security applications. The trade-offs in processing performance and results' accuracy are nonetheless application-dependent. While some applications need strict deterministic execution, others value fast (but possibly approximated) answers. Despite the existing literature on how to relax and trade strict determinism for efficiency or deadlines, we lack a formal characterization of levels of determinism, needed by industries to assess whether or not such trade-offs are acceptable. To bridge the gap, we introduce the notion of D-bounded eventual determinism, where D is the maximum out-of-order delay of the input data. We design and implement TinTiN, a streaming middleware that can be used in combination with user-defined streaming applications to provably enforce D-bounded eventual determinism. We evaluate TinTiN with a real-world streaming application for Advanced Metering Infrastructure (AMI) monitoring, showing that it provides an order-of-magnitude improvement in processing performance, while minimizing delays in output generation, compared to a state-of-the-art strictly deterministic solution that waits for time proportional to D, for each input tuple, before generating output that depends on it.
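
    To make the determinism trade-off concrete, here is a minimal Python sketch of the strictly deterministic baseline the abstract contrasts with: every tuple is buffered until a tuple more than D time units newer has been seen, at which point no earlier tuple can still arrive. The function name and tuple format are illustrative, not TinTiN's API.

```python
import heapq

def d_bounded_reorder(tuples, D):
    """Deterministically reorder a stream whose maximum out-of-order
    delay is D; a tuple is emitted once a timestamp beyond ts + D has
    been observed, so no preceding tuple can still arrive."""
    heap, max_seen = [], float("-inf")
    for ts, payload in tuples:
        heapq.heappush(heap, (ts, payload))
        max_seen = max(max_seen, ts)
        # Everything at or before max_seen - D is now safe to emit.
        while heap and heap[0][0] <= max_seen - D:
            yield heapq.heappop(heap)
    while heap:  # flush remaining tuples at end of stream
        yield heapq.heappop(heap)

# Tuples arrive at most D = 2 time units late.
stream = [(1, "a"), (3, "b"), (2, "c"), (5, "d"), (4, "e")]
print(list(d_bounded_reorder(stream, D=2)))  # emitted in timestamp order
```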

    Time- and Computation-Efficient Data Localization at Vehicular Networks' Edge

    As Vehicular Networks rely increasingly on sensed data to enhance functionality and safety, efficient and distributed data analysis is needed to effectively leverage new technologies in real-world applications. Considering the tens of GBs per hour sensed by modern connected vehicles, traditional analysis, based on global data accumulation, can rapidly exhaust the capacity of the underlying network, becoming increasingly costly, slow, or even infeasible. Employing the edge processing paradigm, which aims at alleviating this drawback by leveraging vehicles' computational power, we are the first to study how to localize, efficiently and distributively, relevant data in a vehicular fleet for analysis applications. This is achieved by appropriate methods to spread requests across the fleet, while efficiently balancing the time needed to identify relevant vehicles and the computational overhead induced on the Vehicular Network. We evaluate our techniques using two large sets of real-world data in a realistic environment where vehicles join or leave the fleet during the distributed data localization process. As we show, our algorithms are both efficient and configurable, outperforming the baseline algorithms with up to a 40× speedup and up to 3× lower computational overhead, while providing good estimates of the fraction of vehicles with relevant data and fairly spreading the workload over the fleet. All code as well as detailed instructions are available at https://github.com/dcs-chalmers/dataloc_vn.
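
    The paper's algorithms are not spelled out in the abstract; as one plausible illustration of spreading requests while balancing discovery time against overhead, the sketch below probes random batches of vehicles of geometrically growing size, stops once enough relevant vehicles are found, and estimates the fraction of relevant vehicles from the probes. All names and parameters are hypothetical.

```python
import random

def localize(fleet, is_relevant, batch_size=8, target_hits=5):
    """Probe the fleet in growing random batches until target_hits
    relevant vehicles are found; larger batches mean fewer rounds
    (less time) but more probed vehicles (more overhead)."""
    remaining = list(fleet)
    random.shuffle(remaining)
    probed, hits = 0, []
    while remaining and len(hits) < target_hits:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        probed += len(batch)
        hits += [v for v in batch if is_relevant(v)]
        batch_size *= 2  # geometric growth trades overhead for time
    estimate = len(hits) / probed if probed else 0.0
    return hits, estimate

# Toy fleet where every 50th vehicle holds relevant data.
hits, frac = localize(range(1000), lambda v: v % 50 == 0)
print(f"found {len(hits)} vehicles; estimated relevant fraction {frac:.3f}")
```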

    Stream-IT: Continuous and dynamic processing of production systems data - Throughput bottlenecks as a case-study

    Considering the need for continuous availability of information out of the data generated in Cyber-Physical production systems, we investigate the use of continuous stream processing as a paradigm for generating useful information out of the data, to support efficient and safe operation, as well as planning activities. Our contributions and expected benefits: (i) we show possibilities to automate and pipeline the validation and analysis of the data, hence providing an automated way to improve the quality of the latter and parallelizing the two phases; (ii) we show how to achieve lower latency in generating the desired information, enabling it to be continuously made available, before whole batches of data are gathered, in cost-efficient ways; (iii) besides the automation of the above procedures, which are commonly done in a batch fashion and with significant manual effort by production system analysts, we show additional options for configuring ways to automate deeper analysis of the data; in particular, we provide evidence of how the rich semantics of stream processing frameworks can ease the development and deployment of data analysis applications in production systems. Moreover, using the problem of bottleneck detection as a sample scenario, we illustrate the above in a concrete fashion, on cost-efficient systems that are plausible in existing deployments. The experimental study is on a 2-year data set with more than 8.5 million entries, from a system including more than 30 interconnected machines, and it demonstrates the benefits of the proposed methods in providing timely and multidimensional information from the data, enabling possibilities for deeper analyses.
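
    As a minimal illustration of contribution (i), the sketch below chains validation and analysis as Python generators, so analysis starts on the first validated records instead of waiting for a complete batch; the record fields, window size, and metric are hypothetical.

```python
def validate(records):
    """Drop malformed entries as they stream by (pipelined with analysis)."""
    for r in records:
        if r.get("machine") and isinstance(r.get("cycle_time"), (int, float)):
            yield r

def windowed_mean(records, window=4):
    """Emit the mean cycle time per machine for each full window."""
    buffers = {}
    for r in records:
        values = buffers.setdefault(r["machine"], [])
        values.append(r["cycle_time"])
        if len(values) == window:
            yield r["machine"], sum(values) / window
            values.clear()

raw = [{"machine": "M1", "cycle_time": t} for t in (10, 12, 11, 13, 9, 10, 12, 11)]
for machine, mean in windowed_mean(validate(raw)):
    print(machine, mean)  # results appear per window, not per batch
```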

    A scalable SIEM correlation engine and its application to the Olympic Games IT infrastructure

    The scalability of security event correlation has become a major concern for security analysts and IT administrators when considering complex IT infrastructures that need to handle gargantuan amounts of events or wide correlation window spans. The current correlation capabilities of Security Information and Event Management (SIEM) systems, based on a single node in centralized servers, have proved insufficient to process large event streams. This paper introduces a step forward in the current state of the art to address the aforementioned problems. The proposed model takes into account the two main aspects of this field: distributed correlation and query parallelization. We present a case study of a multiple-step attack on the Olympic Games IT infrastructure to illustrate the applicability of our approach.
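
    The sketch below shows those two aspects in miniature, with an invented brute-force rule and event format: events are hash-partitioned by source IP across correlation nodes (query parallelization), and each node correlates only its own keys within a sliding window (distributed correlation).

```python
from collections import defaultdict, deque

class Correlator:
    """One correlation node: flags bursts of failed logins per source
    IP within a sliding time window (rule and threshold invented)."""
    def __init__(self, window_s=60, threshold=3):
        self.window_s, self.threshold = window_s, threshold
        self.events = defaultdict(deque)

    def process(self, event):
        q = self.events[event["src_ip"]]
        q.append(event["ts"])
        while q and q[0] < event["ts"] - self.window_s:  # expire old events
            q.popleft()
        if len(q) >= self.threshold:
            return f"ALERT: possible brute force from {event['src_ip']}"

def route(event, nodes):
    """Hash-partition by key so all events that must be correlated
    together land on the same node."""
    return nodes[hash(event["src_ip"]) % len(nodes)]

nodes = [Correlator() for _ in range(4)]
for ts in range(5):
    ev = {"src_ip": "10.0.0.7", "ts": ts}
    alert = route(ev, nodes).process(ev)
    if alert:
        print(alert)
```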

    Towards data-driven additive manufacturing processes

    Additive Manufacturing (AM), or 3D printing, is a potential game-changer in the medical and aerospace sectors, among others. AM enables rapid prototyping (allowing development and manufacturing of advanced components in a matter of days), weight reduction, mass customization, and on-demand manufacturing that reduces inventory costs. At present, though, AM has been showcased in many pilot studies but has not reached broad industrial application. Online monitoring and data-driven decision-making are needed to go beyond existing offline and manual approaches. We aim at advancing the state of the art by introducing the STRATA framework. While providing APIs tailored to AM printing processes, STRATA leverages common processing paradigms such as stream processing and key-value stores, enabling both scalable analysis and portability. As we show with a real-world use case, STRATA can support online analysis with sub-second latency for custom data pipelines monitoring several processes in parallel.
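
    STRATA's actual APIs are not given in the abstract; the following is only an indicative sketch of per-process online monitoring backed by a key-value store, here a plain dict keeping a rolling window of (hypothetical) melt-pool temperatures per printing process and flagging statistical outliers.

```python
import statistics

store = {}  # stand-in for a key-value store of per-process state

def monitor(reading, z_limit=3.0, history=20):
    """Flag a reading that deviates more than z_limit standard
    deviations from the process's recent history, then record it."""
    window = store.setdefault(reading["process_id"], [])
    alert = None
    if len(window) >= 5:
        mu, sigma = statistics.mean(window), statistics.pstdev(window)
        if sigma and abs(reading["temp"] - mu) > z_limit * sigma:
            alert = f"process {reading['process_id']}: anomalous temperature {reading['temp']}"
    window.append(reading["temp"])
    del window[:-history]  # keep only the most recent readings
    return alert

for temp in (1500, 1502, 1499, 1501, 1500, 1650):
    alert = monitor({"process_id": "AM-7", "temp": temp})
    if alert:
        print(alert)
```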

    Adaptive Stream-based Shifting Bottleneck Detection in IoT-based Computing Architectures

    Cloud computing is revolutionizing the backbone of data analysis applications, including industrial ones. One of its main pillars is the separation of the logic with which data is accessed (e.g., to study the efficiency of a manufacturing system) from the actual hardware (e.g., servers) that maintains and analyses the data. Large distributed cyber-physical systems, enabled by, among other technologies, the Internet of Things (IoT), have nonetheless made clear that 'what to do' with the data and 'where to do it' are not disjoint problems; i.e., cloud computing on its own is not enough. Fog and edge computing have emerged as complementary options to distribute the analysis, helping with these challenges by means of close-to-the-source data analysis. We show, for a key problem in industrial processes, that of shifting bottleneck detection, how to take advantage of such multi-tier computing architectures to perform continuous and configurable analysis of data from Manufacturing Execution Systems. We propose a processing framework, STRATUM, and an algorithm, AMBLE, for continuous data stream processing. STRATUM seamlessly distributes and parallelizes the processing across the tiers, and AMBLE guarantees consistent analysis in spite of timing fluctuations, which are commonly introduced by, e.g., the communication system; it also achieves efficiency through appropriate data structures for in-memory processing. The experimental study, on a real-world dataset taken from a production line over two years and including 8.5 million entries, shows the benefits of the proposed solution in enabling configurable and efficient analysis.
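
    For intuition, the sketch below implements the active-period heuristic often used for shifting bottleneck detection (not necessarily AMBLE's exact criterion): per machine, track the length of the ongoing busy streak and report the machine with the longest streak as the momentary bottleneck, which shifts as streaks end.

```python
def momentary_bottlenecks(samples):
    """At each sampling instant, report the machine with the longest
    ongoing busy streak as the momentary bottleneck."""
    streak = {}
    for t, status in enumerate(samples):  # status: machine -> is_busy
        for machine, busy in status.items():
            streak[machine] = streak.get(machine, 0) + 1 if busy else 0
        yield t, max(streak, key=streak.get)

timeline = [
    {"M1": True,  "M2": True},
    {"M1": True,  "M2": False},
    {"M1": True,  "M2": True},
    {"M1": False, "M2": True},
    {"M1": False, "M2": True},
]
for t, bottleneck in momentary_bottlenecks(timeline):
    print(f"t={t}: bottleneck is {bottleneck}")  # shifts from M1 to M2
```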

    FORTE: an extensible framework for robustness and efficiency in data transfer pipelines

    In the age of big data and growing product complexity, it is common to monitor many aspects of a product or system, in order to extract well-founded intelligence, draw conclusions, and continue driving innovation. Automating and scaling processes in data pipelines becomes essential to keep pace with the increasing rates of data generated by such practices, while meeting security, governance, scalability, and resource-efficiency demands. We present FORTE, an extensible framework for robustness and transfer-efficiency in data pipelines. We identify sources of potential bottlenecks and explore the design space of approaches to deal with the challenges they pose. We study and evaluate the synergetic effects of data compression, in-memory processing, and task scheduling on pipeline performance. A prototype of FORTE is implemented and studied in a use case at Volvo Trucks with high-volume production-level data sets, on the order of hundreds of gigabytes to terabytes per burst. Various general-purpose lossless data compression algorithms are evaluated, in order to balance compression effectiveness against time in the pipeline. All in all, FORTE makes it possible to navigate these trade-offs and achieve benefits in latency and sustainable rate (up to 1.8 times better) and in resource utilisation, while also enabling additional features such as integrity verification, logging, monitoring, traceability, and cataloguing of transferred data. We also note that the resource-efficiency improvements achievable with FORTE, and its extensibility, can imply further benefits regarding scheduling, orchestration, and energy efficiency in such pipelines.
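
    In the spirit of FORTE's codec evaluation, a small standard-library benchmark like the one below exposes the trade-off between compression effectiveness and time on a sample payload; the payload and codec settings are arbitrary placeholders, not the ones used at Volvo Trucks.

```python
import bz2, lzma, time, zlib

payload = b"timestamp,signal,value\n" + b"0001,torque,42.0\n" * 50_000

def benchmark(name, compress):
    """Report compression ratio and throughput for one codec."""
    start = time.perf_counter()
    compressed = compress(payload)
    elapsed = time.perf_counter() - start
    ratio = len(payload) / len(compressed)
    print(f"{name:5s} ratio {ratio:7.1f}x  {len(payload) / elapsed / 1e6:8.1f} MB/s")

benchmark("zlib", lambda d: zlib.compress(d, level=6))
benchmark("bz2", lambda d: bz2.compress(d, compresslevel=9))
benchmark("lzma", lambda d: lzma.compress(d, preset=1))
```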

    LoCoVolt: Distributed Detection of Broken Meters in Smart Grids through Stream Processing

    Smart Grids and Advanced Metering Infrastructures are rapidly replacing traditional energy grids. The cumulative computational power of their IT devices, which can be leveraged to continuously monitor the state of the grid, is nonetheless vastly underused. This paper provides evidence of the potential of streaming analysis run at smart grid devices. We propose a structural component, which we name LoCoVolt (Local Comparison of Voltages), that is able to detect, in a distributed fashion, malfunctioning smart meters that report erroneous information about power quality. This is achieved by comparing the voltage readings of meters that, because of their proximity in the network, are expected to report readings following similar trends. Having this information allows utilities to react promptly and thus increase the timeliness, quality, and safety of their services to society and, implicitly, their business value. As we show, based on our implementation on Apache Flink and an evaluation conducted with resource-constrained hardware (i.e., with capacity similar to that of hardware in smart grids) and data from a real-world network, the streaming paradigm can deliver efficient and effective monitoring tools and thus achieve the desired goals with almost no additional computational cost.
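
    A single-snapshot sketch of the comparison principle (LoCoVolt itself tracks trends over time on Apache Flink): each meter's voltage is checked against the median of its nearby peers, and a large relative deviation marks the meter as suspect. Topology and threshold are invented for illustration.

```python
import statistics

def flag_suspect_meters(readings, neighbours, tolerance=0.05):
    """Flag meters whose voltage deviates from the median of their
    electrical neighbours by more than the given relative tolerance."""
    suspects = []
    for meter, volts in readings.items():
        peers = [readings[p] for p in neighbours[meter] if p in readings]
        if len(peers) >= 2:
            reference = statistics.median(peers)
            if abs(volts - reference) / reference > tolerance:
                suspects.append(meter)
    return suspects

readings = {"m1": 230.1, "m2": 229.8, "m3": 230.4, "m4": 205.0}
neighbours = {m: [p for p in readings if p != m] for m in readings}
print(flag_suspect_meters(readings, neighbours))  # ['m4']
```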

    The DEBS 2020 grand challenge

    The ACM DEBS 2020 Grand Challenge is the tenth in a series of challenges that provide a common ground and evaluation criteria for a competition aimed at both research and industrial event-based systems. The focus of the ACM DEBS 2020 Grand Challenge is Non-Intrusive Load Monitoring (NILM): the goal is to detect when appliances contributing to an aggregated stream of voltage and current readings from a smart meter are switched on or off. NILM is leveraged in many contexts, ranging from monitoring of energy consumption to home automation. This paper describes the specifics of the data streams provided in the challenge, as well as the benchmarking platform that supports the testing of the solutions submitted by the participants.
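
    For intuition about the NILM task, the toy detector below flags step changes in an aggregate power signal as appliance on/off events; competitive challenge solutions are far more elaborate and work on the raw voltage and current waveforms.

```python
def detect_switch_events(power, step_threshold=50.0):
    """Report samples where aggregate power jumps or drops by more
    than step_threshold watts as appliance ON/OFF events."""
    for i in range(1, len(power)):
        delta = power[i] - power[i - 1]
        if abs(delta) >= step_threshold:
            yield i, "ON" if delta > 0 else "OFF", abs(delta)

aggregate = [120, 121, 119, 620, 622, 621, 150, 149]  # watts
for idx, kind, size in detect_switch_events(aggregate):
    print(f"sample {idx}: appliance {kind} (~{size:.0f} W)")
```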